
    Nearly Optimal Deterministic Algorithm for Sparse Walsh-Hadamard Transform

    For every fixed constant $\alpha > 0$, we design an algorithm for computing the $k$-sparse Walsh-Hadamard transform of an $N$-dimensional vector $x \in \mathbb{R}^N$ in time $k^{1+\alpha} (\log N)^{O(1)}$. Specifically, the algorithm is given query access to $x$ and computes a $k$-sparse $\tilde{x} \in \mathbb{R}^N$ satisfying $\|\tilde{x} - \hat{x}\|_1 \leq c \|\hat{x} - H_k(\hat{x})\|_1$, for an absolute constant $c > 0$, where $\hat{x}$ is the transform of $x$ and $H_k(\hat{x})$ is its best $k$-sparse approximation. Our algorithm is fully deterministic and only uses non-adaptive queries to $x$ (i.e., all queries are determined and performed in parallel when the algorithm starts). An important technical tool that we use is a construction of nearly optimal and linear lossless condensers, which is a careful instantiation of the GUV condenser (Guruswami, Umans, Vadhan, JACM 2009). Moreover, we design a deterministic and non-adaptive $\ell_1/\ell_1$ compressed sensing scheme based on general lossless condensers that is equipped with a fast reconstruction algorithm running in time $k^{1+\alpha} (\log N)^{O(1)}$ (for the GUV-based condenser) and is of independent interest. Our scheme significantly simplifies and improves an earlier expander-based construction due to Berinde, Gilbert, Indyk, Karloff, and Strauss (Allerton 2008). Our methods use linear lossless condensers in a black-box fashion; therefore, any future improvement on explicit constructions of such condensers would immediately translate to improved parameters in our framework (potentially leading to $k (\log N)^{O(1)}$ reconstruction time with a reduced exponent in the poly-logarithmic factor, and eliminating the extra parameter $\alpha$). Finally, by allowing the algorithm to use randomness, while still using non-adaptive queries, the running time of the algorithm can be improved to $\tilde{O}(k \log^3 N)$.
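
    As a small illustration of the objects in the $\ell_1/\ell_1$ guarantee above (and not the paper's sublinear-time algorithm), the sketch below computes a full Walsh-Hadamard transform in $O(N \log N)$ operations together with the best $k$-sparse approximation $H_k(\hat{x})$; the helper names `walsh_hadamard` and `best_k_sparse` are illustrative choices, not from the paper.

```python
import numpy as np

def walsh_hadamard(x):
    """Iterative fast Walsh-Hadamard transform (unnormalized); len(x) must be a power of two."""
    x = np.array(x, dtype=float)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def best_k_sparse(v, k):
    """H_k(v): keep the k entries of largest magnitude, zero out the rest."""
    out = np.zeros_like(v)
    top = np.argsort(np.abs(v))[-k:]
    out[top] = v[top]
    return out

# Toy check of the l1/l1 guarantee ||tilde_x - hat_x||_1 <= c * ||hat_x - H_k(hat_x)||_1.
N, k = 16, 2
rng = np.random.default_rng(0)
x = rng.standard_normal(N)
hat_x = walsh_hadamard(x)
tilde_x = best_k_sparse(hat_x, k)            # idealized estimate: exact top-k of hat_x
err = np.abs(tilde_x - hat_x).sum()
tail = np.abs(hat_x - best_k_sparse(hat_x, k)).sum()
print(err, tail)                             # equal here, i.e. c = 1 for this idealized estimate
```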

    Which Regular Expression Patterns are Hard to Match?

    Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, the word break problem, etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its {\em depth} (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star, and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH. An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., $[a|ab|bc]^*$. This corresponds to the so-called {\em word break} problem, for which a dynamic programming algorithm with a runtime of (roughly) $O(n\sqrt{m})$ is known. We show that the latter bound is not tight and improve the runtime to $O(nm^{0.44\ldots})$.
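
    For concreteness, the sketch below is the textbook dynamic-programming baseline for the word break problem mentioned above: it decides whether a text splits into dictionary words in roughly $O(n \cdot L)$ time for maximum word length $L$, which is weaker than the $O(n\sqrt{m})$ and $O(nm^{0.44\ldots})$ bounds discussed in the paper. The function name `word_break` is an illustrative choice.

```python
def word_break(text, words):
    """Decide whether `text` can be written as a concatenation of strings from `words`
    (classic DP baseline, not the faster algorithms discussed above)."""
    words = set(words)
    n = len(text)
    max_len = max((len(w) for w in words), default=0)
    reachable = [False] * (n + 1)
    reachable[0] = True                       # the empty prefix is always breakable
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if reachable[j] and text[j:i] in words:
                reachable[i] = True
                break
    return reachable[n]

# Matches the example pattern [a|ab|bc]^* above: "abbc" = "ab" + "bc".
print(word_break("abbc", {"a", "ab", "bc"}))  # True
print(word_break("abb", {"a", "ab", "bc"}))   # False
```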

    Sparse recovery using sparse matrices

    We consider the approximate sparse recovery problem, where the goal is to (approximately) recover a high-dimensional vector x from its lower-dimensional sketch Ax. A popular way of performing this recovery is by finding x* such that Ax=Ax*, and ||x*||_1 is minimal. It is known that this approach ``works'' if A is a random *dense* matrix, chosen from a proper distribution. In this paper, we investigate this procedure for the case where A is binary and *very sparse*. We show that, both in theory and in practice, sparse matrices are essentially as ``good'' as the dense ones. At the same time, sparse binary matrices provide additional benefits, such as reduced encoding and decoding time.
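
    A minimal sketch of the recovery procedure described above, assuming NumPy and SciPy are available: it solves "minimize ||x*||_1 subject to Ax* = Ax" as a linear program, with a toy sparse binary matrix standing in for the constructions analyzed in the paper. The helper name `l1_recover` and the specific parameters are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, b):
    """Basis pursuit: find x minimizing ||x||_1 subject to A x = b,
    via the standard LP split x = x_plus - x_minus with x_plus, x_minus >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                        # objective: sum(x_plus) + sum(x_minus)
    A_eq = np.hstack([A, -A])                 # encodes A (x_plus - x_minus) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
    return res.x[:n] - res.x[n:]

# Toy sparse binary measurement matrix: d ones per column in random rows
# (illustrative only, not the expander-based matrices studied in the paper).
rng = np.random.default_rng(1)
n, m, d, k = 40, 20, 3, 2
A = np.zeros((m, n))
for col in range(n):
    A[rng.choice(m, size=d, replace=False), col] = 1.0

x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x_hat = l1_recover(A, A @ x_true)
print(np.allclose(x_hat, x_true, atol=1e-6))  # typically True for sufficiently sparse x_true
```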

    Sketching via hashing: from heavy hitters to compressed sensing to sparse fourier transform

    Sketching via hashing is a popular and useful method for processing large data sets. Its basic idea is as follows. Suppose that we have a large multi-set of elements S=[formula], and we would like to identify the elements that occur "frequently" in S. The algorithm starts by selecting a hash function h that maps the elements into an array c[1…m]. The array entries are initialized to 0. Then, for each element a ∈ S, the algorithm increments c[h(a)]. At the end of the process, each array entry c[j] contains the count of all data elements a ∈ S mapped to j.
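
    A minimal sketch of that hashing step, using a single hash function (practical heavy-hitter sketches such as Count-Min combine several hash functions to control collision error); the names h, c, and S mirror the notation in the paragraph, and the concrete data is made up for illustration.

```python
import random
from collections import Counter

m = 16
random.seed(0)
salt = random.getrandbits(32)
h = lambda a: hash((salt, a)) % m             # hash function h: elements -> {0, ..., m-1}

c = [0] * m                                   # array c[0..m-1], entries initialized to 0
S = ["x"] * 50 + ["y"] * 30 + list("abcdefghij")
for a in S:
    c[h(a)] += 1                              # increment the counter that a hashes to

# c[h(a)] upper-bounds the true frequency of a: collisions can only add counts.
print(c[h("x")], Counter(S)["x"])             # first value >= 50, second is exactly 50
```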

    On Approximate Nearest Neighbors under l∞ Norm

    The nearest neighbor search (NNS) problem is the following: given a set of $n$ points $P = \{p_1, \ldots, p_n\}$ in some metric space $X$, preprocess $P$ so as to efficiently answer queries which require finding a point in $P$ closest to a query point $q \in X$. The approximate nearest neighbor search ($c$-NNS) is a relaxation of NNS which allows returning any point within $c$ times the distance to the nearest neighbor (called a $c$-nearest neighbor). This problem is of major and growing importance to a variety of applications. In this paper, we give an algorithm for $(4\lceil \log_{1+\rho} \log 4d \rceil + 1)$-NNS in $\ell_\infty^d$ with $O(dn^{1+\rho} \log^{O(1)} n)$ storage and $O(d \log^{O(1)} n)$ query time. Moreover, we obtain an algorithm for 3-NNS for $\ell_\infty$ with $n^{\log d + 1}$ storage. The preprocessing time is close to linear in the size of the data structure. The algorithm can also be used (after simple modifications) to output the exact nearest neighbor in time bounded by $O(d \log^{O(1)} n)$ plus the number of $(4\lceil \log_{1+\rho} \log 4d \rceil + 1)$-nearest neighbors of the query point. Building on this result, we also obtain an approximation algorithm for a general class of product metrics. Finally, we show that for any $c < 3$ the $c$-NNS problem in $\ell_\infty$ is provably as hard as the subset query problem (also called the partial match problem). This indicates that obtaining a sublinear query time and subexponential (in $d$) space for $c < 3$ might be hard.
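
    As a baseline for the problem definition above (and not the data structures or hardness results of the paper), the following sketch answers an exact nearest neighbor query under the $\ell_\infty$ norm by a linear scan in $O(dn)$ time; the function name `linf_nearest_neighbor` is an illustrative choice.

```python
import numpy as np

def linf_nearest_neighbor(P, q):
    """Exact brute-force NNS under the l_infinity norm: return the index of the
    point p in P minimizing max_i |p_i - q_i|, together with that distance."""
    dists = np.max(np.abs(P - q), axis=1)     # l_inf distance from q to each row of P
    best = int(np.argmin(dists))
    return best, float(dists[best])

rng = np.random.default_rng(2)
n, d = 100, 8
P = rng.standard_normal((n, d))
q = rng.standard_normal(d)
print(linf_nearest_neighbor(P, q))
```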